Method and system for efficient hashing optimized for hardware accelerated caching

ABSTRACT

A system and method for data caching are provided. The disclosed method includes organizing a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs, receiving an Input/Output (I/O) request from a host system, determining that cache memory is needed to fulfill the I/O request, and performing a cache lookup in connection with fulfilling the I/O request, where the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.

CROSS-REFERENCE TO RELATED APPLICATIONS

This Non-Provisional patent Application claims the benefit of U.S. Provisional Patent Application No. 62/410,752, filed Oct. 20, 2016, the entire disclosure of which is hereby incorporated herein by reference.

FIELD OF THE DISCLOSURE

The present disclosure is generally directed toward computer memory.

BACKGROUND

Hashing is one of the basic frameworks utilized in a caching environment. The advantage of hashing for cache lookup is that in the best case the lookup will be O(1) when there are no collisions. But when there are collisions, the average or worst case lookup can be very time consuming because of the memory accesses involved.

There is a need for an effective hashing function which can make sure that the hash slots are used optimally. The same region in each virtual disk should not map to the same slot; otherwise, it has a tendency to get into the collision list. For example, Logical Block Address (LBA) 0, which contains the master boot record, is frequently accessed. This means that if LBA 0 of every virtual disk maps to the same hash slot, these frequently accessed LBAs are more likely to collide and hence end up in the collision list.

When caching is implemented in hardware, the load and store operations on memory locations determine the performance that can be achieved by the solution. An effective hash structure organization would be useful so that the load and store operations are minimized. Typically, in a hardware accelerated solution, the division operation is very expensive and it is not preferred to add division logic to the hardware if such hardware can be avoided.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is described in conjunction with the appended figures, which are not necessarily drawn to scale:

FIG. 1 is a block diagram depicting a computing system in accordance with at least some embodiments of the present disclosure;

FIG. 2 is a block diagram depicting details of an illustrative controller in accordance with at least some embodiments of the present disclosure;

FIG. 3A is a block diagram depicting additional details of hash slot buckets and the slots contained therein in accordance with at least some embodiments of the present disclosure;

FIG. 3B is a block diagram depicting additional details of a hash extension in accordance with at least some embodiments of the present disclosure;

FIG. 4 is a flow diagram depicting a method of adding to a local hash value in accordance with at least some embodiments of the present disclosure;

FIG. 5A is a first portion of a flow diagram depicting a hash searching method in accordance with at least some embodiments of the present disclosure;

FIG. 5B is a second portion of the flow diagram from FIG. 5A; and

FIG. 6 is a flow diagram depicting a method of approximating a number of rows needed to accommodate an Input/Output (I/O) operation in accordance with at least some embodiments of the present disclosure.

DETAILED DESCRIPTION

The ensuing description provides embodiments only, and is not intended to limit the scope, applicability, or configuration of the claims. Rather, the ensuing description will provide those skilled in the art with an enabling description for implementing the described embodiments. It being understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the appended claims.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and this disclosure.

As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprise,” “comprises,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term “and/or” includes any and all combinations of one or more of the associated listed items.

As will be discussed in further detail herein, embodiments of the present disclosure present a hashing method for efficient cache lookup that minimizes memory accesses. A new hash with a chaining scheme is introduced with a hash table (e.g., a deep DRAM hash table) that is used for the cache lookup. In some embodiments, multi-level hash functions are used: one level to get to an appropriate hash slot bucket and another level to get to the appropriate hash slot within the hash slot bucket in case all slots are occupied.

A unique hash structure is also provided that can aid in a set associative hashing scheme. The proposed hash structure also enables a method to add elements to a hash, while ensuring that all slots in a hash slot bucket are equally loaded.

Another aspect of the present disclosure is to provide a method for approximating the number of slots to load for an R5/R6 Input/Output (I/O) operation.

With reference to FIGS. 1-6, various embodiments of the present disclosure will be described. While many of the examples depicted and described herein will relate to a RAID architecture, it should be appreciated that embodiments of the present disclosure are not so limited. Indeed, aspects of the present disclosure can be used in any type of computing system and/or memory environment. In particular, embodiments of the present disclosure can be used in any type of caching scheme (whether employed by a RAID controller or some other type of device used in a communication system). For example, hard drives and hard drive controllers (e.g., SCSI controllers, SAS controllers, or RAID controllers) may be configured to implement embodiments of the present disclosure. As another example, network cards or the like having cache memory may also be configured to implement embodiments of the present disclosure.

With reference now to FIG. 1, additional details of a computing system 100 capable of implementing hashing methods and various cache lookup techniques will be described in accordance with at least some embodiments of the present disclosure. The computing system 100 is shown to include a host system 104, a controller 108 (e.g., a SCSI controller, a SAS controller, a RAID controller, etc.), and a storage array 112 having a plurality of storage devices 136 a-N therein. The system 100 may utilize any type of data storage architecture. The particular architecture depicted and described herein (e.g., a RAID architecture) should not be construed as limiting embodiments of the present disclosure. If implemented as a RAID architecture, however, it should be appreciated that any type of RAID scheme may be employed (e.g., RAID-0, RAID-1, RAID-2, . . . , RAID-5, RAID-6, etc.).

In a RAID-0 (also referred to as a RAID level 0) scheme, data blocks are stored in order across one or more of the storage devices 136 a-N without redundancy. This effectively means that none of the data blocks are copies of another data block and there is no parity block to recover from failure of a storage device 136. A RAID-1 (also referred to as a RAID level 1) scheme, on the other hand, uses one or more of the storage devices 136 a-N to store a data block and an equal number of additional mirror devices for storing copies of a stored data block. Higher level RAID schemes can further segment the data into bits, bytes, or blocks for storage across multiple storage devices 136 a-N. One or more of the storage devices 136 a-N may also be used to store error correction or parity information.

A single unit of storage can be spread across multiple devices 136 a-N and such a unit of storage may be referred to as a stripe. A stripe, as used herein and as is well known in the data storage arts, may include the related data written to multiple devices 136 a-N as well as the parity information written to a parity storage device 136 a-N. In a RAID-5 (also referred to as a RAID level 5) scheme, the data being stored is segmented into blocks for storage across multiple devices 136 a-N with a single parity block for each stripe distributed in a particular configuration across the multiple devices 136 a-N. This scheme can be compared to a RAID-6 (also referred to as a RAID level 6) scheme in which dual parity blocks are determined for a stripe and are distributed across each of the multiple devices 136 a-N in the array 112.

One of the functions of the controller 108 is to make the multiple storage devices 136 a-N in the array 112 appear to a host system 104 as a single high capacity disk drive. Thus, the controller 108 may be configured to automatically distribute data supplied from the host system 104 across the multiple storage devices 136 a-N (potentially with parity information) without ever exposing the manner in which the data is actually distributed to the host system 104.

In the depicted embodiment, the host system 104 is shown to include a processor 116, an interface 120, and memory 124. It should be appreciated that the host system 104 may include additional components without departing from the scope of the present disclosure. The host system 104, in some embodiments, corresponds to a user computer, laptop, workstation, server, collection of servers, or the like. Thus, the host system 104 may or may not be designed to receive input directly from a human user.

The processor 116 of the host system 104 may include a microprocessor, central processing unit (CPU), collection of microprocessors, or the like. The memory 124 may be designed to store instructions that enable functionality of the host system 104 when executed by the processor 116. The memory 124 may also store data that is eventually written by the host system 104 to the storage array 112. Further still, the memory 124 may be used to store data that is retrieved from the storage array 112. Illustrative memory 124 devices may include, without limitation, volatile or non-volatile computer memory (e.g., flash memory, RAM, DRAM, ROM, EEPROM, etc.).

The interface 120 of the host system 104 enables the host system 104 to communicate with the controller 108 via a host interface 128 of the controller 108. In some embodiments, the interface 120 and host interface(s) 128 may be of a same or similar type (e.g., utilize a common protocol, a common communication medium, etc.) such that commands issued by the host system 104 are receivable at the controller 108 and data retrieved by the controller 108 is transmittable back to the host system 104. The interfaces 120, 128 may correspond to parallel or serial computer interfaces that utilize wired or wireless communication channels. The interfaces 120, 128 may include hardware that enables such wired or wireless communications. The communication protocol used between the host system 104 and the controller 108 may correspond to any type of known host/memory control protocol. Non-limiting examples of protocols that may be used between interfaces 120, 128 include SAS, SATA, SCSI, FibreChannel (FC), iSCSI, ATA over Ethernet, InfiniBand, or the like.

The controller 108 may provide the ability to represent the entire storage array 112 to the host system 104 as a single high volume data storage device. Any known mechanism can be used to accomplish this task. The controller 108 may help to manage the storage devices 136 a-N (which can be hard disk drives, solid-state drives, or combinations thereof) so as to operate as a logical unit. In some embodiments, the controller 108 may be physically incorporated into the host system 104 as a Peripheral Component Interconnect (PCI) expansion (e.g., PCI express (PCIe)) card or the like. In such situations, the controller 108 may be referred to as a RAID adapter.

The storage devices 136 a-N in the storage array 112 may be of similar types or may be of different types without departing from the scope of the present disclosure. The storage devices 136 a-N may be co-located with one another or may be physically located in different geographical locations. The nature of the storage interface 132 may depend upon the types of storage devices 136 a-N used in the storage array 112 and the desired capabilities of the array 112. The storage interface 132 may correspond to a virtual interface or an actual interface. As with the other interfaces described herein, the storage interface 132 may include serial or parallel interface technologies. Examples of the storage interface 132 include, without limitation, SAS, SATA, SCSI, FC, iSCSI, ATA over Ethernet, InfiniBand, or the like.

The controller 108 is shown to have communication capabilities with a controller cache 140. While depicted as being separate from the controller 108, it should be appreciated that the controller cache 140 may be integral to the controller 108, meaning that components of the controller 108 and the controller cache 140 may be contained within a single physical housing or computing unit (e.g., server blade). The controller cache 140 is provided to enable the controller 108 to perform caching operations. The controller 108 may employ caching operations during execution of I/O commands received from the host system 104. Depending upon the nature of the I/O command and the amount of information being processed during the command, the controller 108 may require a large number of cache memory modules 148 or a smaller number of cache memory modules 148. The memory modules 148 may correspond to flash memory, RAM, DDR memory, or some other type of computer memory that is quickly accessible and can be rewritten multiple times. The number of separate memory modules 148 in the controller cache 140 is typically larger than one, although a controller cache 140 may be configured to operate with a single memory module 148 if desired.

The cache interface 144 may correspond to any interconnect that enables the controller 108 to access the memory modules 148, temporarily store data thereon, and/or retrieve data stored thereon in connection with performing an I/O command or some other executable command. In some embodiments, the controller cache 140 may be integrated with the controller 108 and may be executed on a CPU chip or placed on a separate chip within the controller 108. In such a scenario, the interface 144 may correspond to a separate bus interconnect within the CPU or traces connecting a chip of the controller cache 140 with a chip executing the processor of the controller 108. In other embodiments, the controller cache 140 may be external to the controller 108 in which case the interface 144 may correspond to a serial or parallel data port.

Direct mapping is one type of cache configuration in which each block is mapped to exactly one cache location. Conceptually, this is like rows in a table with three columns: the data block or cache line that contains the actual data fetched and stored, a tag that contains all or part of the address of the fetched data, and a flag bit that connotes the presence of a valid bit of data in the row entry.

Fully associative mapping is similar to direct mapping in structure, but allows a block to be mapped to any cache location rather than to a pre-specified cache location (as is the case with direct mapping).

Set associative mapping can be viewed as a compromise between direct mapping and fully associative mapping in which each block is mapped to a subset of cache locations. It is sometimes called N-way set associative mapping, which provides for a location in main memory to be cached to any of “N” locations in the L1 cache.
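
As a rough illustration of the three mapping schemes described above, the following Python sketch lists the cache locations in which a given block may be placed. The function name, its parameters, and the modulo-based selection of a set are assumptions made purely for illustration and are not taken from the present disclosure.

```python
def candidate_locations(block, num_locations, ways):
    """Return the cache locations in which `block` may be placed.

    ways == 1              -> direct mapping (exactly one location)
    ways == num_locations  -> fully associative mapping (any location)
    otherwise              -> N-way set associative mapping
    Assumes num_locations is a multiple of ways (illustrative only).
    """
    num_sets = num_locations // ways
    set_index = block % num_sets      # assumed way of choosing the set
    first = set_index * ways
    return list(range(first, first + ways))
```

With eight locations, block 5 maps to a single location under direct mapping, to any of the eight under fully associative mapping, and to one group of four under 4-way set associative mapping.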

As will be discussed in further detail herein, the controller 108 and controller cache 140 may be enabled to implement an N-way set associative mapping in a more efficient manner than previous caching systems. In particular, the controller 108 and controller cache 140 may be configured to use hashing techniques that are N-time associative. This will effectively control the number of hash collisions, which can help increase processing speed even though different inputs may result in the same hash output. Although this particular type of hash strategy will be described herein, it should be appreciated that any hash type can be used, including relatively simple hash types that potentially result in the same hash outputs using different hash inputs.

With reference now to FIG. 2, additional details of a controller 108 will be described in accordance with at least some embodiments of the present disclosure. The controller 108 is shown to include the host interface(s) 128 and storage interface(s) 132. The controller 108 is also shown to include a processor 204, memory 208, one or more drivers 212, and a power source 216.

The processor 204 may include an Integrated Circuit (IC) chip or multiple IC chips, a CPU, a microprocessor, or the like. The processor 204 may be configured to execute instructions in memory 208 that are shown to include a cache management module 224, hashing instructions 228, hash search instructions 232, and a slot collision list 236. Furthermore, in connection with executing the cache management module 224, the processor 204 may utilize a set associative hash 220 containing a plurality of hash buckets 240 a-M. The set associative hash 220 is shown to be maintained internally to the controller 108. It should be appreciated, however, that some or all of the set associative hash 220 may be stored and/or maintained external to the controller 108. Alternatively or additionally, the set associative hash 220 may be stored or contained within memory 208 of the controller 108.

The memory 208 may be volatile and/or non-volatile in nature. As indicated above, the memory 208 may include any hardware component or collection of hardware components that are capable of storing instructions and communicating those instructions to the processor 204 for execution. Non-limiting examples of memory 208 include RAM, ROM, flash memory, EEPROM, variants thereof, combinations thereof, and the like.

The instructions stored in memory 208 are shown to be different instruction sets, but it should be appreciated that the instructions can be combined into a smaller number of instruction sets without departing from the scope of the present disclosure. The cache management module 224, when executed, enables the processor 204 to receive I/O commands from the host system 104, determine one or more caching operations that need to be performed in connection with executing the I/O command, and identify which memory module(s) 148 should be used in connection with performing the one or more caching operations. The cache management module 224 may also be configured to invoke the other instruction sets in memory 208 to further perform such caching operations. The cache management module 224 may also be configured to manage the set associative hash 220 and the values assigned to various data fields within the set associative hash 220. In particular, as hash buckets 240 a-M are utilized, the cache management module 224 may be enabled to update the set associative hash 220 to reflect such utilizations. The cache management module 224 may also be responsible for facilitating communications between the controller 108 and the controller cache 140.

The hashing instructions 228, when executed, enable the processor 204 to calculate hash output values based on hash inputs. In some embodiments, hash inputs used by the hashing instructions 228 to calculate hash output values may include numbers assigned to particular hash buckets 240 a-M, numbers assigned to hash slots within the hash buckets 240 a-M, memory addresses of memory modules 148, LBAs, storage time, data retrieval time, strip/row numbers, a hash index, a maximum number of bits needed to complete the I/O command, and other numbers derived therefrom. In some embodiments, the hashing instructions 228 may implement a hashing function that corresponds to a linear hashing function. Use of a linear hash function can help ensure that hash slots for successive strips/rows of data are stored consecutively (thereby increasing the efficiency with which such data can be later retrieved by the controller 108). Use of a linear hash function is advantageous especially in hardware accelerated caching solutions where the hardware thread does not have direct access to the memory module 148 (e.g., SRAM or DRAM) where the hash slots are stored. These hash slots can be loaded when required and stored back when the objective is achieved (e.g., the I/O command is completed). Since an I/O command almost always results in consecutive strips/rows, many slots can be loaded and stored together, thus ensuring that there are not too many unnecessary load and store operations. In some embodiments, the transpose operation of the hash function executed by the hashing instructions 228 ensures that the same region in each virtual disk does not map to the same slot in a hash bucket 240 a-M; otherwise, the slot will have a tendency to be added to the slot collision list 236.

The hash search instructions 232, when executed, enable the processor 204 to search the set associative hash 220 for hash collisions. As will be described in further detail herein, as the controller 108 executes I/O commands, the controller 108 may utilize the hash search instructions 232 to see if a particular piece or set of data is already stored in the memory module 148. The hash search instructions 232 may be enabled to utilize the hashing instructions 228 to generate a hash for the data being searched and may compare that hash to hashes stored in the set associative hash 220. If collisions (e.g., matches) are found, the hash search instructions 232 and/or cache management module 224 may be enabled to update the slot collision list 236 as will be described in further detail herein.

With reference now to FIGS. 3A and 3B, details of a set associative hash 220 used in accordance with embodiments of the present disclosure will be described. The set associative hash 220 is shown to include a plurality of hash buckets 240 a-M linearly arranged. The set associative hash 220 may include up to M hash buckets, where M is an integer number greater than or equal to one.

Each hash slot bucket 240 is shown to include a plurality of hash slots 304 a-d. Although the hash slot buckets 240 are shown to include four hash slots 304 a-d, it should be appreciated that a hash slot bucket 240 may be configured to include a greater or lesser number of hash slots. For instance, the hash slot bucket 240 may have two, three, four, five, . . . , ten, twenty, or more hash slots without departing from the scope of the present disclosure.

Each hash slot 304 is shown to include a number of fields for storing information in connection with a hash generated for the slot 304. The fields include, without limitation, a flag field 308, a Cache Segment ID (CSID) field 312, an LSB field 316, and an MSB field 320. The flag field 308 may contain a number of bits that indicate various details or pieces of information related to the hash slot 304. For instance, the flags 308 may be used to indicate whether data stored in the hash slot 304 continues in another hash slot (e.g., by an appropriate indication in the bit link field 324). As another non-limiting example, the flags 308 may indicate the presence of a bad block in one of the LBA ranges that map into the corresponding slot 304.

The CSID field 312 provides an identification of a cache segment. The identification may be provided as an alphanumerical string. In some embodiments, adjacent slots may be assigned consecutively-numbered CSIDs. In other embodiments, CSIDs may be assigned randomly or pseudo-randomly. In any event, the CSID provides a substantially unique (e.g., unique to the data structure 220) identifier for a slot 304. This means that any particular slot 304 will have a different CSID from all other slots 304 in the set associative hash 220.

The combination of the LSB field 316 and MSB field 320 may provide an address range in memory where a hash slot is stored. In particular, the identifiers stored in the fields 316, 320 may identify an LBA of a memory module 148 or an LBA range of the memory module 148. Collectively, the fields 316, 320 may be referred to as a tag field. In some embodiments, a 1 byte value in the tag field may indicate that the cache segment frame is used as a hash slot extension, which is shown in FIG. 3B. In some embodiments, each hash slot 304 is provisioned with four potential collision lists. This effectively ensures that when there is a hash hit and a new element is added to the hash index, the hash does not go into the collision list but instead into the next available slot. The hash will only be added to the collision list 236 when all four slots 304 of the hash bucket 240 are filled.
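
The slot and bucket layout described above can be pictured with a minimal Python sketch. The field widths, the sentinel value used to mark an empty slot, the bad-block bit position, and all class and method names are assumptions chosen for illustration; the disclosure names the fields but does not fix their encoding.

```python
from dataclasses import dataclass, field
from typing import List

INVALID_CSID = 0xFFFFFFFF   # assumed sentinel marking an empty slot
FLAG_BAD_BLOCK = 0x1        # assumed bit position for the bad block flag


@dataclass
class HashSlot:
    flags: int = 0              # flag field 308
    csid: int = INVALID_CSID    # Cache Segment ID field 312
    lsb: int = 0                # LSB field 316
    msb: int = 0                # MSB field 320

    def is_empty(self) -> bool:
        return self.csid == INVALID_CSID

    def tag(self, lsb_bits: int = 32) -> int:
        # The LSB and MSB fields together form the tag (assumed 32-bit LSB).
        return (self.msb << lsb_bits) | self.lsb

    def has_bad_block(self) -> bool:
        return bool(self.flags & FLAG_BAD_BLOCK)


@dataclass
class HashSlotBucket:
    # Four slots per bucket, as in the depicted embodiment.
    slots: List[HashSlot] = field(
        default_factory=lambda: [HashSlot() for _ in range(4)])
```

Grouping the four slots in one contiguous object mirrors the idea that a whole bucket can be loaded and searched linearly in one memory access.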

This set associative hashing ensures that the length of the collision list is reduced. For instance, assume that 4 elements need to be added to the same hash slot bucket 240. All 4 are added into the slots 304 a-d and there would not be any addition to the collision list 236. When a fifth element is to be added, an extension (as shown in FIG. 3B) that can hold up to 16 slots is created and the last index (e.g., the fourth index) in the hash slot 304 d is updated with the extension ID. The information in the last index (e.g., the fourth index) is copied into the first index slot in the extension, thus establishing the chain. Now up to 16 more elements can be added to this extension frame without adding a further collision list 236. After all the 16 slots of the hash extension are filled, if an additional element needs to be added, then one more hash extension frame that can hold up to 16 slots is created and it is chained to the last slot in the current extension frame (thus creating a chain of hash slot extensions). Accordingly, if 100 elements need to be added and set associative hashing is not employed, then the worst case non-linear traversal is going to be 100 to determine if there is a hash hit or not. But with this approach, only the four slots 304 in the hash bucket 240 are searched linearly, then one chain is loaded and the 16 slots in the first extension are searched linearly. In the same way, there would be a maximum of 6 extensions with each one having a linear list of 16 elements. Hence with this approach, in the worst case, there are 6 non-linear traversals and 100 linear traversals. It should be noted that linear traversals do not involve loading new memory, which makes the traversal faster, unlike a non-linear traversal (e.g., a linked list), which involves loading of the next link and its corresponding memory before proceeding to the next element.
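
The count of six extensions in the 100-element example can be reproduced with a short calculation. The sketch below uses a simplified model, assumed for illustration, in which the bucket holds four elements and every extension frame holds sixteen; it ignores the slot that each chain link itself occupies, so an actual implementation may need slightly more frames. The function name and parameters are hypothetical.

```python
import math


def extension_frames_needed(num_elements, slots_per_bucket=4,
                            slots_per_extension=16):
    """Estimate the number of chained extension frames (non-linear hops).

    Simplified model: chain links are assumed not to consume a slot.
    """
    overflow = max(0, num_elements - slots_per_bucket)
    return math.ceil(overflow / slots_per_extension)
```

Under this model, 100 elements overflow the bucket by 96, and 96 / 16 gives the six non-linear traversals stated above, compared with 100 for a plain linked collision list.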

Hardware implementations can optimize comparisons in a linear list. In some embodiments, the hash bucket 240 preserves the expandability of the chaining, but this feature kicks in only when unusual workloads make the chaining necessary (e.g., there are more than 4 collisions). The hash slot 304 also indicates if the bad block bit is set (e.g., via flag 308). If the flag bit is set, it indicates that some/all of the cache segments present in this hash slot 304 have bad blocks and hence the caching algorithms can take appropriate actions based on the specific use case, for example, diverting the command to a firmware module to take specific actions.

As mentioned above, each consecutive strip/row can be found in consecutive hash slot buckets. This effectively allows a group of memory accesses to be performed instead of multiple individual memory accesses. As used herein, a hash slot bucket corresponds to a group of positions in a hash table and a hash slot corresponds to a particular position in the hash table. A particular slot may be empty or contain an entry.

With reference now to FIG. 4, a method of adding elements to a hash assigned to a slot bucket 240 will be described in accordance with at least some embodiments of the present disclosure. As can be appreciated, the hash search instructions 232 will search the hashes assigned to each slot bucket 240 for matches. These hashes need to be updated as additional slots in the hash slot bucket 240 are used and data is stored in corresponding addresses of cache memory 148.

The method begins when an I/O command is received at the controller 108 from the host system 104 and then the controller 108 invokes the cache management module 224 to search for an empty slot in the set associative hash 220. Once initiated, the cache management module 224 will set a last slot index equal to 3 (e.g., if there are 4 slots per hash bucket) (step 404). Then the cache management module 224 will set an empty slot index (step 408). In this step, the cache management module 224 will find an empty slot by searching hash slot buckets 240 starting at hash slot bucket 0 240 a until the last slot index is reached. At each hash bucket, the cache management module 224 will check to see if the CSID field of the hash slot bucket is INVALID. This may indicate that the slot bucket is empty. In some embodiments, the first hash index is obtained from the I/O command although it can also be received from other sources. This first hash index gives the hash slot bucket which contains four slots. The next check is to see if any of the four slots 304 within the hash slot bucket 240 are empty (step 412) (e.g., from 0 to 3). If empty, the slot will be used to store the data in connection with the received I/O command.

If the first slot bucket is empty, then the first available slot within the slot bucket is assigned to the I/O command and the strip/row associated with that slot will be used to store data in connection with the I/O command. The cache management module 224 will also add the CSID that is assigned to the strip/row in connection with the I/O command to the hash slot assigned to the hash slot bucket 240 (step 440). The cache management module 224 will also update the appropriate fields of the slot 304 and bucket 240 to indicate that the assigned slot is not part of a slot chain. Thereafter, the method ends at step 444.

Referring back to step 408, if an empty slot is not found, the cache management module 224 will check to see if the last slot index in the hash bucket has an extension (step 416). If an extension is found (step 420), then the cache management module 224 will get the CSID from the last slot and load the CSID into local memory. The cache management module 224 will then start the linear search within the frame/slot extension (step 424). The method then reverts back to step 412 to see if any empty slots are found within the extension.

If no extension is found in step 420, then the cache management module 224 allocates a new cache frame/extension (step 428). In some embodiments, the allocated cache frame can be of any size and can be indexable by an identifier. As a non-limiting example, the cache frame may be a 128 byte cache frame.

The method further continues with the cache management module 224 copying the last slot into the first slot in the newly-allocated frame (step 432). Then the cache management module 224 sets the tag field 316, 320 to indicate that the newly-allocated frame corresponds to an extension (step 436). The cache management module 224 may further set the empty slot index to an appropriate value to indicate that the newly-allocated frame does not have any slots currently allocated therein. The method then proceeds to step 440.
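
The flow of FIG. 4 described above can be sketched in Python as follows. This is a simplified model under stated assumptions: a bucket or extension frame is represented as a plain list, an empty slot is represented by `None` rather than an INVALID CSID, and an extension link is represented by a small wrapper object instead of a tag field value. The names `Extension` and `add_to_bucket` are hypothetical.

```python
SLOTS_PER_BUCKET = 4    # slots per hash slot bucket in the depicted embodiment
SLOTS_PER_FRAME = 16    # slots per extension frame


class Extension:
    """Marks a last slot that links to an extension frame."""
    def __init__(self, frame):
        self.frame = frame


def add_to_bucket(bucket, csid):
    """Store csid in the first empty slot, chaining frames when full."""
    frame = bucket
    while True:
        # Linear search of the current bucket or frame (step 412).
        for i, slot in enumerate(frame):
            if slot is None:
                frame[i] = csid              # add the CSID (step 440)
                return
        last = frame[-1]
        if isinstance(last, Extension):      # extension exists (steps 416-424)
            frame = last.frame
            continue
        # No extension yet: allocate one and chain it (steps 428-436).
        new_frame = [None] * SLOTS_PER_FRAME
        new_frame[0] = last                  # copy the last slot (step 432)
        frame[-1] = Extension(new_frame)     # mark last slot as link (step 436)
        frame = new_frame
```

For example, starting from `bucket = [None] * SLOTS_PER_BUCKET`, the first four calls fill the bucket; the fifth call moves the fourth entry into a new extension frame and stores the new CSID next to it.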

With reference now to FIGS. 5A and 5B, additional details of a hash searching method will be described in accordance with at least some embodiments of the present disclosure. The hash searching method is performed to see if there is a hash hit or miss for a particular I/O command.

The method begins by setting the slot index (step 504). In some embodiments, the slot index is set using a hash function. In more specific embodiments, the hash function used to set the slot index is not a “strong” hash function. Rather, a relatively weak hash function is used for simple manipulation of data (e.g., shift and add functions). This means that faster responses can be achieved, but at the expense of higher collision rates. However, because the hash slots are grouped into buckets and because adjacent LBAs are assigned to consecutive buckets, the benefits of the fast I/O processing can be realized without having to worry about the increased risk of collisions. Furthermore, because embodiments of the present disclosure contemplate using DRAM, the I/O command can be executed relatively quickly because the data for any I/O command is stored in consecutive memory locations. Moreover, all of the M buckets can be accessed in a single read access rather than having to use multiple accesses. This will be true as long as there are fewer than four collisions for each hash bucket (if each bucket has four slots).

In some embodiments, the get hash index algorithm computes a hash index(e.g., “HashIndex”) according to the following:

  Transp(VD = a₀a₁a₂a₃a₄a₅a₆a₇a₈a₉a_(a)a_(b)) = a_(b)a_(a)a₉a₈a₇a₆a₅a₄a₃a₂a₁a₀
  shift = maxBitsForHS - maxBitsInLdNumber
  flippedVD = (Transp(VD) << shift)
  HashIndex = (Strip_RowNumber + flippedVD) % numberOfHashSlots

where a_(b)a_(a)a₉a₈a₇a₆a₅a₄a₃a₂a₁a₀ is the virtual disk ID or number (e.g., TargetID), where maxBitsForHS is the number of bits required to represent the hash table (e.g., if the hash table size is 1024, the maxBitsForHS value is 10 (2^10 = 1024)), where maxBitsInLdNumber is the number of bits required to represent all of the Virtual Disks (e.g., if the total number of virtual disks supported is 64, then this value is 6, whereas for 2048 the value is 11), where Transp(VD) is the transposed value of the Virtual Disk Number, where shift is the number of bits by which Transp(VD) needs to be shifted, where flippedVD is the shifted Transp(VD), which makes the hash sparse, and where Strip_RowNumber is the strip number present in the I/O command for a RAID 0 or RAID 1 volume or the row number for RAID 5 or RAID 6 volumes.
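As a non-limiting illustration, the hash index computation above may be sketched in Python as follows. The function names `transpose_bits` and `get_hash_index` are hypothetical. The sketch assumes that Transp(VD) reverses the bit order of the virtual disk number over maxBitsInLdNumber bits, that numberOfHashSlots equals 2 raised to maxBitsForHS, and that maxBitsForHS is at least maxBitsInLdNumber; none of these is stated explicitly in the text above.

```python
def transpose_bits(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def get_hash_index(
    vd: int,
    strip_row_number: int,
    max_bits_for_hs: int,
    max_bits_in_ld_number: int,
) -> int:
    """Compute HashIndex from the virtual disk number and the strip/row number."""
    number_of_hash_slots = 1 << max_bits_for_hs  # assumed power-of-two table
    shift = max_bits_for_hs - max_bits_in_ld_number
    flipped_vd = transpose_bits(vd, max_bits_in_ld_number) << shift
    return (strip_row_number + flipped_vd) % number_of_hash_slots


if __name__ == "__main__":
    # Strip 0 of virtual disks 0 through 3 with a 1024-slot table and 64 disks.
    for vd in range(4):
        print(vd, get_hash_index(vd, 0, 10, 6))
```

Under these assumptions, strip 0 of virtual disks 0, 1, 2, and 3 lands at indices 0, 512, 256, and 768, respectively, so the frequently accessed first strips of different virtual disks are spread across the table rather than mapped to the same slot.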

After the slot index is found, the method proceeds with the cache management module 224 and/or hash search instructions 232 determining the number of slots that need to be loaded to accommodate the I/O command (step 508). In some embodiments, this number may be approximated by the cache management module 224 and/or hash search instructions 232 (as will be described in connection with FIG. 6). The slots then begin to be loaded (step 512). In some embodiments, the slots are loaded in alignment with a burst boundary. The burst boundary can depend upon the slot size, the burst size, and the number of slots included in a slot bucket. The slot loading begins with the first slot (step 516).

The cache management module 224 and/or hash search instructions 232 will determine whether the number of slots needed (e.g., as estimated or determined in step 508) is less than the total number of slots in a bucket (step 520). If this query is answered negatively, then the method proceeds by checking if the last slot has an extension (step 536). If the last slot has an extension (step 540), then the cache management module 224 and/or hash search instructions 232 get the ID of the frame from the last slot index and load that particular frame into local memory for additional searching (step 544). The cache management module 224 and/or hash search instructions 232 then update the last slot number and the last slot index (step 548). After the slot number and last slot index have been updated, the method proceeds to step 528.

Referring back to step 540, if there is no extension present, then the method proceeds to step 568 (FIG. 5B), as will be discussed in further detail herein.

Referring back to step 520, if the query is answered affirmatively (e.g., the slot number is less than or equal to the last slot index), then the method proceeds by comparing the strip/row from the slot to the strip/row for the requested I/O command (step 524). Thereafter, the method proceeds with the cache management module 224 and/or hash search instructions 232 determining if a hash match is found (step 528). If not, the method increments to the next slot (step 532) and returns to step 520.

With reference to FIG. 5B, the method proceeds until a CSID match is found (step 552). Finding a matching CSID may be referred to as finding a hash hit. At this point, the required operation is performed in connection with executing the I/O command (step 556). In some embodiments, the local hash value may be stored in memory of the controller 108 or in the controller cache 140 (step 560). Thereafter, the method ends (step 564).

Referring back to the inquiry of step 540 (FIG. 5A), if there is no extension present, then the method proceeds with the cache management module 224 and/or hash search instructions 232 performing the required operation associated with a hash miss (step 568). The local hash can then be stored in memory if needed (step 560) and then the method ends (step 564).
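As a non-limiting illustration, the hit/miss search of FIGS. 5A and 5B may be sketched in Python as follows. The `Slot` tuple, the `INVALID` sentinel, the `is_extension` flag, and the `search` function are hypothetical names chosen for this sketch; burst-aligned loading and the storing of the local hash are omitted.

```python
from typing import Dict, List, NamedTuple, Optional

INVALID = -1  # assumed sentinel for an unoccupied slot


class Slot(NamedTuple):
    strip_row: int
    csid: int
    is_extension: bool = False  # hypothetical stand-in for the tag field


def search(
    bucket: List[Slot],
    extensions: Dict[int, List[Slot]],
    strip_row: int,
) -> Optional[int]:
    """Return the CSID on a hash hit, or None on a hash miss."""
    frame = bucket
    while True:
        for slot in frame:
            if slot.is_extension:
                # The last slot holds the extension frame ID; load that frame
                # and continue the linear search there.
                frame = extensions[slot.csid]
                break
            if slot.csid != INVALID and slot.strip_row == strip_row:
                return slot.csid  # hash hit
        else:
            # Every slot was compared and no extension exists: hash miss.
            return None


if __name__ == "__main__":
    bucket = [Slot(10, 0), Slot(11, 1), Slot(12, 2), Slot(INVALID, 7, True)]
    extensions = {7: [Slot(13, 3), Slot(14, 4),
                      Slot(INVALID, INVALID), Slot(INVALID, INVALID)]}
    print(search(bucket, extensions, 14))
    print(search(bucket, extensions, 99))
```

The sketch compares the strip/row of each slot against the requested strip/row, follows an extension only when the last slot is tagged as one, and reports a miss once a frame without an extension has been exhausted.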

With reference now to FIG. 6, a method of approximating a number of rows needed to accommodate an I/O operation will be described in accordance with at least some embodiments of the present disclosure. While for R0/R1 RAID architectures the number of hash slots to be loaded is equal to the number of strips spanned by the I/O command, for R5 the number of hash slots needed for an I/O command corresponds to the number of rows spanned by the I/O command. Finding the number of rows often requires a divider function. Typically, in hardware accelerated solutions, the division operation is very expensive and it is not preferred to add division logic to the hardware if it can be avoided. Accordingly, in an R5 architecture, it may be desirable to provide a method for approximating the number of rows needed to accommodate an I/O operation without requiring a division operation.

Embodiments of the present disclosure contemplate such an approximation method. The method begins by determining the number of strips of the first row (NSFR) (step 604). This value of NSFR may be set equal to the row data size minus the Log Arm. The Log Arm is the Logical Arm in a RAID volume. For example, for a RAID 5 volume of 3 drives, there will be two data Logical Arms (Log Arm 0 and Log Arm 1) and one parity arm. Data is striped across these Logical Arms.

An I/O command contains a Row Number, a Logical Arm Number, and an Offset in Arm to represent the start LBA for a RAID 5 volume, where RowNumber = startLba / number of Data Arms, Log Arm = (startLba / strip size) % number of Data Arms, and OffsetInArm = startLba % strip size. The method proceeds by determining whether the NSFR is greater than the number of strips spanned by the I/O command (step 608). If the query is answered positively, then the NSFR is adjusted to equal the number of strips spanned by the I/O operation (step 612). If, however, the query of step 608 is answered negatively, then the method proceeds to determine the row data size, the number of rows, the number of full rows, and the number of strips spanned in the last row (step 616). The method further continues by determining whether the number of strips spanned in the last row is equal to zero (step 620). If this query is answered affirmatively, then the method proceeds to setting the number of rows equal to the determined number of rows plus one (step 624). However, if the query of step 620 is answered negatively, then the method proceeds by approximating the number of rows spanned by the I/O operation (step 628). In some embodiments, the nearest power of two value (NP2) of the row data size of the RAID volume is used to approximate the number of rows spanned by the I/O operation.
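As a non-limiting illustration, a division-free row estimate in the spirit of the method above may be sketched in Python as follows. This is not the exact flow of FIG. 6. The function name `approx_rows_spanned` is hypothetical, the row data size is assumed to be expressed in strips, and the power of two is chosen here as the largest power of two not exceeding the row data size so that the estimate never undercounts; the text above states only that a nearest power of two value is used.

```python
def approx_rows_spanned(log_arm: int, num_strips: int, row_data_size: int) -> int:
    """Estimate the rows spanned by an I/O using shifts instead of division.

    log_arm: Logical Arm on which the I/O starts.
    num_strips: number of strips spanned by the I/O.
    row_data_size: number of data strips per row (assumed unit).
    """
    # Number of strips of the first row (NSFR), clamped to the I/O length.
    nsfr = row_data_size - log_arm
    if nsfr >= num_strips:
        return 1  # the I/O fits within the first row

    remaining = num_strips - nsfr
    # Assumed rounding: largest power of two <= row_data_size, so the shift
    # divides by a value no larger than the true row data size.
    shift = row_data_size.bit_length() - 1
    p2 = 1 << shift
    # Ceiling of remaining / p2 computed with an add and a shift.
    return 1 + ((remaining + p2 - 1) >> shift)
```

When the row data size is itself a power of two, the estimate in this sketch matches the exact row count; otherwise it may exceed the exact count, which would cause more hash slots to be loaded than strictly necessary rather than too few.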

Specific details were given in the description to provide a thorough understanding of the embodiments. However, it will be understood by one of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known circuits, processes, algorithms, structures, and techniques may be shown without unnecessary detail in order to avoid obscuring the embodiments.

While illustrative embodiments of the disclosure have been described in detail herein, it is to be understood that the inventive concepts may be otherwise variously embodied and employed, and that the appended claims are intended to be construed to include such variations, except as limited by the prior art.

What is claimed is:
 1. A method of data caching, the method comprising: organizing a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs; receiving an Input/Output (I/O) request from a host system; determining that cache memory is needed to fulfill the I/O request; and performing a cache lookup in connection with fulfilling the I/O request, wherein the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.
 2. The method of claim 1, wherein each hash slot bucket comprises a same number of hash slots.
 3. The method of claim 2, wherein each hash slot bucket comprises at least four hash slots.
 4. The method of claim 1, wherein the hash value for a hash slot bucket is determined, at least in part, using the LBA or CSID of a hash slot contained within the hash slot bucket as an input to a hash function.
 5. The method of claim 1, further comprising: searching the plurality of hash slots and identifying a hash slot bucket having at least one unoccupied hash slot therein; adding an element to the at least one unoccupied hash slot contained within the identified hash slot bucket; adding a CSID of the at least one hash slot to which the element is added; and setting a next and previous for the CSID of the at least one hash slot to an invalid value.
 6. The method of claim 1, further comprising: determining that an approximation is needed for a number of rows spanned by the I/O request; and implementing an approximation routine in which the approximate number of rows spanned by the I/O request is determined by considering at least two of the following: (i) a nearest power of two value of row data size of a RAID volume, (ii) a number of rows in the RAID volume that are full, and (iii) a number of strips spanned in a last row of the RAID volume.
 7. The method of claim 1, further comprising: setting a slot index to an initial value; comparing a first hash value assigned to a first hash slot in a hash slot bucket with the hash, wherein the first hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket; determining that the hash does not match the first hash value; incrementing the slot index; comparing a next hash value assigned to a next hash slot in the hash slot bucket with the hash, wherein the next hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket with the exception of the first hash slot and any hash slot already having its hash value compared with the hash; and repeating the incrementing and comparing of the next hash value with the hash until a match is found between the hash and the next hash value.
 8. A system, comprising: a controller cache comprising one or more memory modules; and a controller situated between a host system and a data storage device, the controller being configured to temporarily store data in the one or more memory modules of the controller cache in connection with processing Input/Output (I/O) requests received from the host system, the controller further comprising: a processor; and memory coupled to the processor, the memory including instructions that, when executed by the processor, enable the controller to perform the following: organize a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs; receive an Input/Output (I/O) request from the host system; determine that the one or more memory modules are needed to fulfill the I/O request; and perform a cache lookup in connection with fulfilling the I/O request, wherein the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.
 9. The system of claim 8, wherein each hash slot bucket comprises a same number of hash slots.
 10. The system of claim 9, wherein each hash slot bucket comprises at least four hash slots.
 11. The system of claim 8, wherein the hash value for a hash slot bucket is determined, at least in part, using the LBA or CSID of a hash slot contained within the hash slot bucket as an input to a hash function.
 12. The system of claim 8, wherein the instructions further enable the controller to: search the plurality of hash slots and identify a hash slot bucket having at least one unoccupied hash slot therein; add an element to the at least one unoccupied hash slot contained within the identified hash slot bucket; add a CSID of the at least one hash slot to which the element is added; and set a next and previous for the CSID of the at least one hash slot to an invalid value.
 13. The system of claim 8, wherein the instructions further enable the controller to: determine that an approximation is needed for a number of rows spanned by the I/O request; and implement an approximation routine in which the approximate number of rows spanned by the I/O request is determined by considering at least two of the following: (i) a nearest power of two value of row data size of a RAID volume, (ii) a number of rows in the RAID volume that are full, and (iii) a number of strips spanned in a last row of the RAID volume.
 14. The system of claim 8, wherein the instructions further enable the controller to: set a slot index to an initial value; compare a first hash value assigned to a first hash slot in a hash slot bucket with the hash, wherein the first hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket; determine that the hash does not match the first hash value; increment the slot index; compare a next hash value assigned to a next hash slot in the hash slot bucket with the hash, wherein the next hash slot corresponds to a lowest numbered hash slot in the first hash slot bucket with the exception of the first hash slot and any hash slot already having its hash value compared with the hash; and repeat the incrementing and comparing of the next hash value with the hash until a match is found between the hash and the next hash value.
 15. A controller situated between a host system and a data storage array, the controller being configured to temporarily store data in cache memory in connection with processing Input/Output (I/O) requests received from the host system, the controller comprising: a processor; and memory coupled to the processor, the memory including instructions that, when executed by the processor, enable the controller to perform the following: organize a plurality of hash slots into a plurality of hash slot buckets such that each hash slot bucket in the plurality of hash slot buckets contains a plurality of hash slots having Logical Block Addressing (LBA) and Cache Segment ID (CSID) pairs; receive an Input/Output (I/O) request from the host system; determine that the cache memory is needed to fulfill the I/O request; and perform a cache lookup in connection with fulfilling the I/O request, wherein the cache lookup includes analyzing the plurality of hash slots for unoccupied hash slots by comparing a hash against hash values assigned to the hash slot buckets instead of individual hash values assigned to the hash slots.
 16. The controller of claim 15, wherein each hash slot bucket comprises a same number of hash slots.
 17. The controller of claim 16, wherein each hash slot bucket comprises at least four hash slots.
 18. The controller of claim 15, wherein the hash value for a hash slot bucket is determined, at least in part, using the LBA or CSID of a hash slot contained within the hash slot bucket as an input to a hash function.
 19. The controller of claim 15, wherein the instructions further enable the controller to: search the plurality of hash slots and identify a hash slot bucket having at least one unoccupied hash slot therein; add an element to the at least one unoccupied hash slot contained within the identified hash slot bucket; add a CSID of the at least one hash slot to which the element is added; and set a next and previous for the CSID of the at least one hash slot to an invalid value.
 20. The controller of claim 8, wherein the instructions further enable the controller to: determine that an approximation is needed for a number of rows spanned by the I/O request; and implement an approximation routine in which the approximate number of rows spanned by the I/O request is determined by considering at least two of the following: (i) a nearest power of two value of row data size of a RAID volume, (ii) a number of rows in the RAID volume that are full, and (iii) a number of strips spanned in a last row of the RAID volume.